Add drive download utility functions and test #462
base: main
Apophenia
commented
Nov 27, 2024
- Adds utility functions for moving Drive files to S3
from processes.util.google_integration import get_drive_file

def test_get_drive_file():
    test_id = os.environ['EXAMPLE_FILE_ID']
I'm using a test file ID in my local-compose. Could we upload a small file to test with, assuming the UUID of a drive file is otherwise anonymous (e.g. doesn't expose anything else about our account info or structure)?
Definitely!
This is hard-coded now; I added a test folder and a small PDF.
processes/util/google_integration.py
Outdated
nit: only need one new line at the end of the file!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, thanks, vs code was no longer linting on save!
processes/util/google_integration.py
Outdated
        logger.warning(f"HTTP error occurred when downloading Drive file {file_id}: {error}")
        file = None
    except Exception as err:
        logger.warning(f"Unexpected {type(err)=} occurred when downloading drive file {file_id}: {err}")
logger.exception works well in this case to capture the error type and message
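A minimal sketch of that suggestion, for illustration only. `risky_download` and `get_file_or_none` are hypothetical stand-ins, not functions from this PR. Note that `logger.exception` logs at ERROR level rather than WARNING, and appends the traceback, so the exception type and message no longer need to be formatted by hand.

```python
import logging

logger = logging.getLogger("drive_download_example")


def risky_download(file_id: str) -> bytes:
    # Hypothetical stand-in for the real Drive download call.
    raise ValueError(f"no such file: {file_id}")


def get_file_or_none(file_id: str):
    try:
        return risky_download(file_id)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback,
        # which already contains the exception type and message.
        logger.exception("Unexpected error when downloading Drive file %s", file_id)
        return None
```

It should only be called from inside an `except` block, since it relies on the exception currently being handled.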
processes/util/google_integration.py
Outdated
    return build('drive', 'v3', credentials=credentials)

def get_drive_file(file_id: str) -> BytesIO:
    drive_service = create_drive_service(service_account_info=json.loads(SERVICE_ACCOUNT_FILE))
I like the use of functions rather than a class here. Could we create the drive_service at the top level of this file so it's used more like a singleton?
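One way to get singleton-like behavior while keeping plain functions is a cached factory. This is a sketch of an alternative to a bare top-level assignment, not what the PR does: `FakeDriveService` and `get_drive_service` are hypothetical names, and the real factory body would load the credentials and call `build('drive', 'v3', credentials=credentials)`.

```python
from functools import lru_cache


class FakeDriveService:
    """Hypothetical stand-in for the object returned by build('drive', 'v3', ...)."""


@lru_cache(maxsize=1)
def get_drive_service() -> FakeDriveService:
    # Assumption: the real body would load credentials and call build(...).
    # lru_cache runs the body once; later calls return the cached object.
    return FakeDriveService()
```

Compared with creating the service at import time, the cached factory defers the credential lookup until the first call, which can make the module easier to import in tests.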
processes/util/google_integration.py
Outdated
    except HttpError as error:
        logger.warning(f"HTTP error occurred when downloading Drive file {file_id}: {error}")
        file = None
we can return early here and just have return None
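A rough sketch of the early-return shape, under stated assumptions: `DownloadError` stands in for `HttpError`, and `fetch_bytes` is a hypothetical helper rather than anything in this PR.

```python
from io import BytesIO
from typing import Optional


class DownloadError(Exception):
    """Hypothetical stand-in for googleapiclient.errors.HttpError."""


def fetch_bytes(file_id: str) -> bytes:
    # Hypothetical download helper; fails for an empty ID.
    if not file_id:
        raise DownloadError("missing file ID")
    return b"example"


def get_drive_file(file_id: str) -> Optional[BytesIO]:
    try:
        content = fetch_bytes(file_id)
    except DownloadError:
        # Return right away instead of assigning file = None and falling through.
        return None
    return BytesIO(content)
```

Returning early also makes the `Optional[BytesIO]` return type explicit, where the original annotation is `BytesIO` even though `None` is possible.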
processes/util/google_integration.py
Outdated
    downloader = MediaIoBaseDownload(file, request)

    done = False
    while done is False:
I think `while not done` may read better here!
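For reference, the loop with that condition could look roughly like this. `StubDownloader` is a hypothetical stand-in that mimics `MediaIoBaseDownload.next_chunk()`, which returns a `(status, done)` pair; `download_all` is likewise an illustrative name.

```python
from io import BytesIO


class StubDownloader:
    """Hypothetical stand-in mimicking next_chunk() returning (status, done)."""

    def __init__(self, buffer: BytesIO, chunks: list):
        self._buffer = buffer
        self._chunks = list(chunks)

    def next_chunk(self):
        # Write one chunk, then report whether any chunks remain.
        self._buffer.write(self._chunks.pop(0))
        done = not self._chunks
        return None, done


def download_all(downloader) -> None:
    done = False
    while not done:
        _status, done = downloader.next_chunk()
```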
processes/util/google_integration.py
Outdated
I think putting this in the services folder instead is more explicit - e.g. services/google_drive_service.py
yeah, I think it's probably better to refactor this into a service structure and move it there
Overall, looks good - nice work! just a few minor comments.
scopes = ['https://www.googleapis.com/auth/drive']
credentials = Credentials.from_service_account_info(service_account_info, scopes=scopes)

drive_service = build('drive', 'v3', credentials=credentials)
I left this as functions (instead of encapsulating it in a class) based on your comments -- this breaks from the other files in /services but I can see an argument for doing it either way. Let me know if this is what you were envisioning.